Human-mediated dispersal drives the spread of the spotted lanternfly (Lycorma delicatula)

The spotted lanternfly (Lycorma delicatula) is a novel invasive insect from Asia now established and spreading throughout the United States. This species is of particular concern given its ability to decimate important crops such as grapes, fruit trees, as well as native hardwood trees. Since its initial detection in Berks County, Pennsylvania in 2014, spotted lanternfly infestations have been detected in 130 counties (87 under quarantine) within Connecticut, Delaware, Indiana, Maryland, New Jersey, New York, Ohio, Virginia, and West Virginia. Compounding this invasion is the associated proliferation and widespread distribution of the spotted lanternfly’s preferred host plant, the tree-of-heaven (Ailanthus altissima). While alternate host plant species have been observed, the tree-of-heaven which thrives in disturbed and human-dominated areas (e.g., along roads and railways) is likely facilitating the population growth rates of spotted lanternfly. We simulated the population and spread dynamics of the spotted lanternfly throughout the mid-Atlantic USA to help determine areas of risk and inform continued monitoring and control efforts. We tested the prediction that spotted lanternfly spread is driven by human-mediated dispersal using agent-based models that incorporated information on its life-history traits, habitat suitability, and movement and natural dispersal behavior. Overwhelmingly, our results suggest that human-mediated dispersal (e.g., cars, trucks, and trains) is driving the observed spread dynamics and distribution of the spotted lanternfly throughout the eastern USA. Our findings should encourage future surveys to focus on human-mediated dispersal of egg masses and adult spotted lanternflies (e.g., attachment to car or transported substrates) to better monitor and control this economically and ecologically important invasive species.


Materials and methods
Maxent models. We estimated habitat and climate suitability for SLF representing a potential geographic distribution throughout the contiguous United States. We adopted similar methods from two previous studies 36,37 that used maximum entropy (Maxent) models 48,49 to estimate the potential global establishment of SLF. We used the Maxent program (ver. 3.4.1) 50 which we implemented via program R (ver. 4.0.2) 51 using the 'rJava' package (ver. 0. [9][10][11][12][13] 52 and 'dismo' package (ver. 1. [1][2][3][4] 53 . Maxent models can be used to estimate species spatial distributions on a given landscape using presence-only detection data 49 . Model solutions are computed by minimizing the relative entropy (i.e., a measure of dispersedness) within covariate space between probability densities of species presence data and landscape covariates 54 . This is the equivalent of maximizing the entropy (in geographic space) of the probability of species presence among a given set of locations 49,54 .
We used species presence-only data from locations in both native and invaded regions where SLF currently occur in Asia (i.e., China, India, Japan, Democratic People's Republic of Korea [North Korea], Republic of Korea [South Korea], and Socialist Republic of Vietnam) and North America within the United States (Fig. 1).
We combined data of SLF presence from the Global Biodiversity Information Facility 55 throughout Asia (N = 418) and annual survey data (2014-2019) from the Pennsylvania Department of Agriculture's Spotted Lanternfly Program and New York State Department of Agriculture and Markets (N = 1347). We used these data to generate 100-km circular buffers around each location which were later used to extract "background" biological and environmental covariates for use in model performance evaluation 37 .
To generate a suite of environmental variables, we used 1-km resolution covariates of global coverage that included 19 bioclimatic variables 56 and digital elevation map (DEM) data from the WorldClim2 data 57 , and presence-only data for TOH (the preferred host plant for SLF) in both Asia (N = 320) and United States (N = 4813) from the Global Biodiversity Information Facility 55 . We acknowledge that given the known host-plant flexibility www.nature.com/scientificreports/ for SLF 30,31 , that our simplified model using only TOH data to model the SLF spread would lead to conservative estimates of the more complex spread dynamics occurring in nature. Presence-only location points were converted to a 1-km resolution raster layer using the 'raster' package (ver. 3.1-5) 58 . Additionally, we included 1-km resolution human footprint data from the Last of the Wild Project (ver. 3 [LP-3]) 59 that were downloaded from: https:// sedac. ciesin. colum bia. edu/ data/ colle ction/ wilda reas-v3) representing data from 2009. These data represent a global index of human influence that are compiled from various remote-sensing data indicating human-dominated landcover, human population density, power utility infrastructure, agricultural lands, and navigable roads, railways, and waterways 59 , and were important to include given the known potential for human influence on the spread of SLF 23,60 . These human footprint data were also similarly used as a covariate within Maxent models from a previous study to estimate SLF distribution in eastern Asia 36 . Prior to fitting models, we used Pearson's r statistic 61 to test our suite of environmental covariates for multicollinearity and retained the following variables where r ≤ 0. 8 To prepare data for Maxent model fitting and prediction, we split the SLF occurrence data randomly into training (80%) and validation (20%) data sets. We used the previously generated 100-km buffers around SLF locations to extract raster values from each of the aforementioned covariates using the 'raster' package 58 . We then generated a random set of points (N = 1000) from which "background" values were extracted, and randomly assigned to training and validation data sets for use in evaluating model-based predictions of the probability of SLF occurrence 50 . We fit the Maxent model using training data and our suite of predictor covariates using the 'dismo' R package 53 . We then evaluated the model's performance with the independent validation data (20%) using the area under the receiver operating characteristic curve (AUC), true skill statistic (TSS), and kappa performance metrics 53 . Model predictions (with values ranging from 0 to 1) were interpreted as the potential likelihood or probability of SLF occurrence throughout the continental United States as a function of environmental covariates included within the model. As a final step to prepare the Maxent model-based predictions of SLF occurrence for inclusion in subsequent analyses, we generated a downsampled 4-km resolution raster of the model predictions using the geospatial data abstraction library (GDAL ver. 3.2.0) 62 implemented in the 'gdalUtils' package (ver. 2.0.3.2) 63 . We deliberately chose this resolution to reflect the mean distance that individual adult SLF have been observed travelling without human aid between emergence from eggs to egg deposition locations by females [64][65][66] . The fully annotated R script used to fit the Maxent model with additional details is included within Supplementary Materials.

Agent-based model description.
The agent-based model was designed to model SLF spread under multiple alternative scenarios in which we varied the following parameters: population intrinsic rate of growth (i.e., 0.25, 0.5, 1.0, and 1.5), random or non-random movement behavior, and the number, location, and distance to new satellite populations, representing increasing degrees of human-mediated movement effects (Table 1, Fig. 2A www.nature.com/scientificreports/ models in relation to the observed SLF spread to calibrate our model and tune this parameter within our models by comparing model estimated growth with observed growth rates in the introduced range. When we compare our intrinsic rates of growth (r) with a study that estimated SLF annual growth rate to be 5.47 using a stagestructured population model 67 we find that the discrete annual growth rate (λ) when converted to the continuous intrinsic rate of growth (i.e., ln(5.47)/mean generation time of 1 year) we get 1.609 which is very close to our upper bound of 1.5 used within our model simulations.
For each model scenario, we used standardized initial conditions that included a starting population size (N) of 1000 individual SLFs. We set all 1000 individuals to the same starting location within Berks County, PA (40.415240°N, − 75.675340°W) where the first detected population was found in the USA 25 . In our model, this location corresponded to the 4-km 2 cell with cartesian coordinates (438,197). To model different population growth rates, we used four intrinsic rates of growth (r) within a logistic growth model with the following equation: where N t+1 is the population size (N) at time step t + 1, N t is the population size (N) at time step (t), r indicates the intrinsic rate of growth, K is a deliberately high carrying capacity (set to 10 5 to ensure exponential growth), and V rand is a term to add stochasticity to the model; a value between -0.5 and 0.5 drawn from a random uniform distribution at each time step (t). We added additional biological realism to our population model by incorporating SLF adult survival probability, which was drawn at each model iteration from a random uniform distribution between 0.5 and 0.9. To incorporate varying survial probability, we multiplied the population size from the previous time step ( N t−1 ) by the adult survival probability to derive the total number of SLF ( N t ) that was then used in the logistic growth model (above): Our model simultaneously tracked each individual and all subsequent offspring produced by applying movement rules for each individual within the model. These rules governed the movement decisions of every individual at each time step (t) and are depicted in Fig. 2B. Based on findings from previous studies on the biology and general movement behavior of leafhoppers accounting for dispersal via wind 68,69 and SLF 23,65,66,70,71 , we set the cell size of our movement surface (see Maxent Model above for details) to 4 km 2 resolution (Fig. 2B). During each time step (t) of the model, each individual was forced to move to one of the 8 adjacent cells. In alignment with our current understanding of movement behavior of SLF nymphs and adults based on recent studies 64-66 , we did not allow individuals to remain within the cell where they were born from egg masses. Since each time step was equivalent to a 1-year period, which corresponds to the complete univoltine lifecycle of SLF 23 , each movement between two cells for each individual represents the combined lifecycle processes: (1) SLF emergence from egg masses, (2) nymphal maturation and walking dispersal movements, (3) adult flight movements, and (4) egg mass deposition by surviving females (Fig. 1A,B). Using the Maxent model-derived 4-km 2 resolution surface (Fig. 4), individual SLF moved to one of 8 neighboring adjacent cells at each time step (t), using either a random or non-random choice. In model simulations with random movement, each individual would move to a subsequent randomly selected cell. However, in models with informed dispersal movement, each individual chose Environmental stochasticity term (Var) Logistic growth model term used to add variation into estimated population sizes representing environmental stochasticity within the modeled system unif(− 0.5, 0.5) Mean adult survival probability (ϕ) Survival probability for adults that is randomly drawn from a uniform distribution each time step, ranging between 0.5 and 0.9 unif(0.5,0.9) Mean movement coefficient (q) Parameter governing individual movement decisions: either best choice (0.99) or random (0.01) choice decisions (0.01, 0.99) SD movement coefficient Standard deviation of movement coefficient to provide additional variation in behavior decisions among individuals 0.05

Human-mediated movements (h)
The maximum number of newly-established spotted lanternfly populations caused by human-mediated movement events per year (e.g., transport of egg cases or gravid adult females and subsequent successful establishment). We modeled different scenarios that consist of up to 0 (no human movement), or 3, 5, 7, and 10 new populations established at each time step drawn from a uniform distribution between 1 and either 3, 5, 7, or 10 per scenario www.nature.com/scientificreports/ www.nature.com/scientificreports/ the "best" or most preferable cell to move to, which was determined by rank ordering the cells by the probability of SLF occurrence (derived from Maxent models), and selecting the cell with the highest value. The influence of human-mediated movement (e.g., SLF hitchhiking on cars, trucks, trains, or ornamental plants), was modeled using a multi-step process (see Fig. 3), designed to represent the real-world situation where SLF egg masses are laid on a vehicle or gravid females hitchhike on vehicles from one location to a new location where eggs may be deposited or 1st instar nymphs may hatch.
First, during each time step (t), we determined whether human-mediated movements of SLF would occur or not using a stochastic process that is a function of population density. Generally, as SLF population densities increase, we would expect the probability of encountering human vehicles and effective spread events to increase. Hence, to allow the SLF population density to inform this component of the model, we generated a random value from a uniform distribution between 0 and 1, as well as random value from a uniform distribution between 0 and the mean normalized population density and compared these two values during each time step (t) to determine if a human-mediated movement occurred. In circumstances where SLF population density is low (e.g., 0.2 compared to runif(0,1)), there would be a lower probability that this stochastic decision making process would result in adding additional SLF to the individuals in a given cell. Whereas when population denisites are high (e.g., 0.9 compared to runif(0,1)), there is an increased probability of the human-mediated move occurring and additional individuals would then be added to the population size in a given cell. However, there is still the possibility that during some time steps (t), even if population densities are high, this model scenario does not guarantee to include human-mediated populations. In this way, our model captures how even when human-mediated movement of egg masses may occur, those eggs may not result in the functional movement, if for some reason the eggs are inviable, or do not successfully transition to emerged nymphal SLF. Since our model only recognizes these functional movements and potential colonization events, it naturally incorporates biologically meaningful factors related to egg mass survival and hatching probability.
For situations where human-mediated movements occurred, we first determined the maximum number of possible new populations that could occur during each time step t ( Fig. 3; Step 1). During this step, the model included additional stochasticity by randomly drawing between 1 and the maximum number of newly-established populations resulting from human-mediated movement from a uniform distribution for a given model scenario with the following number of movements per time step (i.e., 3, 5, 7, and 10). We subequently incorporated the influence of population density by multiplying the total number of movements per time step by the mean of www.nature.com/scientificreports/ the normalized density of the count of SLF individuals per cell (see Fig. 3; Steps 2-4) and rounded down to the nearest whole integer. In this manner, the number of newly generated SLF populations due to human-mediated movement is positively related to SLF population density. Here, we define a "new population" as a newly emerged (i.e., from egg masses) population of SLF colonizing a new cell that could be either currently occupied or unoccupied. This component of the model was implemented using a "for loop" where the number of iterations is equal to the number of new populations, so that it ultimately generates n new SLF populations, where n is the number of human-mediated movement events. We then determined the number of currently occupied cells to sample from (i.e., the cells where populations will originate from), by a process that randomly sampled between a minimum value of 1 and the total number of occupied cells (if less than 10,000), and in all other cases, 10,000 cells ( Fig. 3; Step 2). In Step 3 (Fig. 3), we used a varying number of 1 to 10 iterations (i) drawn from runif (1,10), to cycle through and generate a list of new candidate locations which were sampled from the pool of SLFoccupied cells previously generated in Step 2 (Fig. 3).
Once we generated a list of randomly sampled starting cells for potential movements, we then generated a corresponding list of destination cells by randomly selecting cells using a normal distribution with a mean of 0, implying no movement, and a standard deviation of 30 for both X and Y coordinates ( Fig. 3; Step 4). We chose to use a value of 30 (on the cartesian coordinate plane), which when multiplied by the width or height of the 4-km 2 cell resolution, indicates a move of 120 km. In this way, a random movement from a given cell could occur in any direction, and range between 0 and upwards of 169 km (along the diagonal). This would be akin to the largest movements being roughly the distance between Philadelphia, PA and Washington, D.C. Using the symmetrical normal distribution also provided the desired behavior resulting in more short distance movements while also allowing for movement of SLF across large distances, albeit less frequently. Given this process, resulting Euclidean distances from origin to destination locations were also normally distributed. For each of the newly generated destination cells, we then sampled from a random uniform distribution between 1 and 60 individuals that represent the number of new individuals emerging at the subsequent time step (t) at that given location ( Fig. 3; Step 5). Finally, of the new set of compiled potential destination cells generated within a given iteration (i), the most "suitable" location (i.e., the location where SLF is most likely to successfully colonize following a human-mediated movement) is selected as the cell with the greatest Maxent-model derived index of SLF habitat suitability ( Fig. 3; Step 6). In this way, the distance of a given human-mediated movement is informed by factors included with the Maxent model such as human foot print and TOH host plant distribution. All of the selected ending locations are then added to the overall SLF population (with corresponding cell locations) contributing to the new population size N t used to estimated N t+1 within the subsequent time step, t + 1 ( Fig. 3; Step 7). Each model scenario was run for a duration of 13 time steps (i.e., years t0 = 2013 to t13 = 2025), and each scenario was repeated for 1000 iterations for a total of 520,000 annual model simulations. All of our model scenarios were implemented using program R 51 .
Results from our model simulations were compiled for each scenario. The simulation output for a given model scenario included results for each year (time step) along with the cartesian coordinates of SLF-occupied cells and the cumulative count of total visits (by SLF individuals) per cell. These data were then converted to rasters in R using the 'raster' package 58 . We binarized these data so that all raster cell values were either 0 (not present) or 1 (present) representing cell occupancy, and then stacked the 1000 rasters per scenario to compute the average probability of occupancy for every cell within each scenario. From these, we set a threshold of high confidence to include only cells with 0.95 mean probability of SLF occurrence or greater to be included in model validation and evaluation steps.

Model evaluation.
To evaluate model performance among all scenarios we compared the set of model simulation results (i.e., rasters thresholded to include cells with 0.95 SLF probability of occurrence or above) with county-level polygons where SLF was present and absent within the spatial extent of Northeastern USA. The extreme thresholding was intended to use only the most conservative model results and reduce any influence from purely random movements and colonization events that could potentially influence our statistical comparison among model scenarios. To get observed county-level SLF occurrence, we used maps of confirmed SLF locations developed by Pennsylvania Department of Agriculture (2014-2017) and online maps maintained by the New York State Integrated Pest Management (2018-2021), which are continually updated (see https:// nysipm. corne ll. edu/ envir onment/ invas ive-speci es-exotic-pests/ spott ed-lante rnfly/). Even though counties are distinct in that some counties have established population outbreaks and are under quarantine, while others may have only single detections, we chose to treat all counties with at least one observed detection as having SLF present. We then used the extract() function within the 'raster' package 58 to determine which counties included agent-based model predicted cells. In situations where a model-predicted cell with SLF present overlapped a county polygon where SLF was observed within a given year, we counted that as a true positive detection. Modelpredicted cells where no SLF occur was counted as false positives. Any polygons that are not occupied by SLF and no model-predicted cells occurred was a true negative. To compare how well our model scenarios predicted the observed distribution of SLF at the county level, we computed the widely used model performance metrics: Precision, Recall, and F1 score 72  www.nature.com/scientificreports/ among years as well as among other key model parameters (i.e., intrinsic growth rate, random vs. non-random movement, or the number of maximum human-mediated movements).

Statistical analysis. We used model performance metrics
Recall (the number of true positive detections among relevant instances) and F1 scores from each model scenario as the response variables within generalized linear models. We initially fit a generalized linear model with mixed effects where we included the following independent additive variables: maximum number of human-mediated movements per year (i.e., 0, 3, 5, 7, or 10) × Year, movement type (non-random or random) × Year, intrinsic rate of growth × Year, and included model Scenario as a random effect using the 'lme4' package 73 . However, the model Scenario random effects term had negligible contribution to model variance for both Recall (var = 0.0003) and F1 (var = 0.0004), and so were subsequently removed from models which were fit using a standard glm(). We included the interactions among each of the fixed-effects model terms with Year to evaluate how Recall and F1 differed among years for each model variable. Since we only had observed county-level SLF occurrence between 2014 and 2021, we constrained our data set to include only these years. We summarized model results using 2-way analysis of variance with α = 0.05 and Tukey's post-hoc tests 74 using the 'agricolae' package 75 to determine pair-wise differences among categorical variable factor levels. All statistical models were implemented in program R 51 .

Results
The Maxent model performed successfully in ranking random background and presence points based on an "excellent" AUC value of 0.997, TSS value of 0.71 which accounts for both model sensitivity and specificity, and a fair kappa statistic of 0.404 53 . Maxent model-based estimates of the probability of SLF occurrence for 4-km 2 cell values ranged between 0.00000010 and 0.8062, and had a mean and SD of 0.032 ± 0.093 with the highest probabilities occurring in the mid-Atlantic region of the USA between New York City and Washington, D.C. (Fig. 4). Variable contributions (%) to the Maxent model-estimated probability of SLF occurrence were greatest for TOH distribution (59%), followed by the human footprint data (12%), temperature seasonality (10%; bioclimatic variable 4), precipitation in warmest quarter (8%; bioclimatic variable 18), and the precipitation in driest month (5%; bioclimatic variable 14). The large contributions of TOH distribution and human footprint (71% of variable contribution) to the Maxent model are evident in the strong correspondence to developed areas and road networks, which we visualized by applying a lower threshold to the full Maxent model values by plotting only cell values > 0.0009 (Fig. 4).  Fig. 5). While the core range of SLF expansion began in southeastern Pennsylvania and surrounding areas between 2014 and 2016, SLF was presumably transported via human-mediated movements (e.g., movement of egg masses, or hitchhiking nymphal instars or adults) to counties further from the core population than could be naturally traversed by SLF otherwise. Currently, counties where SLF have been detected furthest from the area of initial establishment include Dukes County, Massachusetts, Onslow County, North Carolina, Switzerland County, Indiana, and Oswego County, New York (Fig. 5).
Post hoc tests revealed differences in Recall among intrinsic growth rates, with r of 1.5 (0.83 ± 0.03) and 1 (0.82 ± 0.03) being similar, and r of 1.5 being greater than 0.25 (0.81 ± 0.03) which was greater than the intrinsic growth rate r of 0.5 (0.79 ± 0.03; Table 2 and Fig. 6A-D). For the number of maximum human-mediated movements per year, we found Recall which was similar among 3, 5, 7, and 10 maximum human-mediated movements per year (> 0.93 ± 0.01 in all cases) to be greater than 0 human-mediated movements (0.34 ± 0.04; Table 2 and Fig. 6A-D). For the F1 model performance metric, we found differences in main effects among years (F = 1346.6, df = 7, P < 0.0001), the maximum number of human-mediated movements per year (F = 11.05, df = 4, P < 0.0001), and between random (0.32 ± 0.02) and non-random (0.31 ± 0.02) movement (F = 1.41, df = 1, P < 0.001), but not among  (Fig. 6E-H). Post-hoc tests for the maximum number of human-mediated movements per year differed between 0 human-mediated movements per year (0.33 ± 0.02) and all other numbers of maximum human-mediated

Mean SE Group Mean SE Group
Year 2014  www.nature.com/scientificreports/ movements per year (i.e., 3, 5, 7, and 10) which were found to be similar with a mean F1 value that ranged between 0.30-0.31 (Fig. 6E-H). Additionally, we detected an interaction between Year and the maximum number of human-mediated movements per year (F = 352.5, df = 28, P < 0.0001) with a clear inflection point between 2017 and 2018 ( Fig. 6E-H).
To further compare predicted and observed SLF occurrence, we plotted the observed county-level annual spread of SLF and overlaid agent-based model predictions of SLF occurrence (i.e., only cells with 0.95 or greater probability of occupancy), using the model scenario with the greatest Recall and F1 values (i.e., models with intrinsic rate of growth (r) of 1.5) with non-random movement. We then generated maps for each year between 2016 and 2021 among the number of maximum human-mediated movements per year to visualize the difference between model scenarios with no human-mediated movement (0), and scenarios that included varying number of human-mediated movements per year (i.e., 3, 5, 7, and 10; Fig. 7).

Discussion
Our models suggest that the observed spread of the SLF is likely driven by human-mediated dispersal. After designing and running model simulations that controlled for SLF population growth rates, random vs. nonrandom movement behavior of individuals, and the amount of human-mediated dispersal events occurring per year, we found that in all scenarios where no human-mediated movement occurred, our models exhibited extremely poor performance (33% true positive predictions) in predicting the observed spread of the SLF. However, when we included human-mediated dispersal within model scenarios, even at extremely modest amounts (i.e., a maximum of up to 3 successful human-mediated movements per year), models increased to 92% true positive predictions in predicting the observed spread of the SLF. Furthermore, the pattern and distribution of both the predicted and observed SLF spread closely follow the distribution of roads and major highways which happen to also harbor the highest densities of TOH 76,77 . We recommend that based on our findings, future work be focused on directly measuring and tracking the movement pathway for the SLF (likely via "hitchhiking" egg masses and potentially adults) on cars, trucks, and trains. These empirical data could further inform predictive models, such as we developed here, to improve forecasting spread dynamics and the efficacy of early detection of the SLF into previously uncolonized areas. www.nature.com/scientificreports/ Similarities of model performance metrics and predicted SLF-occupied counties among scenarios with differing maximum numbers of human-mediated movements per year make sense when we compared the respective number of simulated movements per year showing a large degree of overlap among these distributions ( Fig. S1; see Supplementary Materials). This is a function of our model component where the maximum number of human-mediated movements per year is drawn from a uniform distribution (see Table 1). Additionally, the total number of new populations established due to human-mediated movements among scenarios also indicated that the maximum number of human-mediated movements per year were also similar ( Fig. S2; see Supplementary Materials).
Economic losses from SLF damage of crops, ornamental, and native plants within the State of Pennsylvania were estimated to be over $550 million under worst-case scenarios 27 . Three distinct scenarios were included in their economic analysis: (1) if the SLF is successfully limited to the 14 counties under SLF quarantine, (2) if the SLF expands to the 12 counties adjacent to the quarantine zone (a total of 26 counties), and (3) if the SLF expands statewide, occurring in all 67 counties throughout Pennsylvania 27 . To provide some context over how rapidly the SLF was actually spreading, by November of 2019, the year they also included the perhaps overly optimistic scenario (1), SLFs had spread to 29 counties within Pennsylvania even surpassing economic scenario (2). However, by fall of 2019, the SLF had also been detected in other states with 28 additional counties in Maryland, Delaware, Virginia, New Jersey, New York, and even as far as Connecticut. Given the observed rate of spread and the ability for the SLF to travel long distances (> 150 km), likely hitchhiking on cars, trucks, and trains, the economic losses from SLF in Pennsylvania, the mid-Atlantic USA, and the conterminous United States may be higher than is currently estimated. Since our models did not explicitly include wind-driven dispersal events based on our current knowledge of SLF behavior, we encourage future studies and models to explore how SLF spread may be influenced by less frequent high wind or storm events.
While understanding the extent and rate of SLF spread are both critical to predicting and hence employing preventive measures to manage SLF and reduce potential economic losses, effective management will be limited without having a clear understanding of the mechanisms underlying the spread dynamics. Our model predictions suggest that human-mediated dispersal is an extremely important driver of the spread of the SLF, and hence, we suggest future work focus on this component of SLF spread dynamics and forecasting. In a generalized modeling framework, research has explored how human-mediated dispersal accelerates the spread dynamics of invasive species 78 . Similar to a previous model that incorporated local spread dynamics (i.e., source [production] and destination [prob(establishment)]), our agent-based SLF spread model includes these same components that capture the effect of human-mediated dispersal of the SLF 78 . For instance, we used both "local" SLF population densities in a given cell to directly influence the likelihood of whether a human-mediated dispersal event might occur, where higher densities of SLF led to more potential human-mediated movements. Additionally, to capture the variability among potential destinations for human-mediated dispersal events within our model, we included a step whereby a set of potential destination locations were compared and then rank ordered, so that ultimately, destinations with higher human density (represented by the human footprint data 59 , would have a higher likelihood of being selected within our models. We also acknowledge that increasing rates of urbanization 79 should be taken into account, as the human foot print data used in our model is based on the landscape in 2009. We acknowledge that while our study successfully demonstrates the importance of human-mediated movement for SLF spread, our model was unable to predict which specific counties SLF will occupy at the leading edge of the spread. This was not an unexpected result given there was very little information provided to the model on the nature of the human-mediated movements beyond the upper bound of a 169-km movement and the Maxent surface providing information on relative suitablility of a given location for new population establishment. One indication of this limitation within our model is the lower values of model Precision due to false positive predictions (i.e., counties where simulations predicted the SLF to occur, but where they are not yet observed; Table S1). Our model would then, not be well-suited to make a prediction of the next county where the SLF is likely to occur. However, we do see value in using our model being able to make generalizable predictions as to the extent and rate of SLF spread into the future. These types of predictions can be informative to take preventive measures in areas where SLF may not yet occur that might include increased surveillance efforts, and pro-active removal or insecticidal treatment of TOH.
Given contemporary rates of globalization and urbanization, examples of human-mediated dispersal are numerous, occur worldwide 80 , and are an important component of anthropogenic global change 2,81-83 . Examples of invasive insects that have exhibited accelerated spread due to human-mediated dispersal that have garnered much attention within the United States include both the Asian longhorned beetle (Anoplophora glabripennis (Motschulsky) (Coleoptera:Cerambycidae)) and the emerald ash borer (EAB) (Agrilus planipennis Fairmaire (Coleoptera: Buprestidae)), among other cases of wood-boring insects 84,85 . In the case of EAB, the movement of contaminated firewood was implicated as a critical component of human-mediated dispersal [86][87][88] . We note that depending on the species, system, and dispersal mechanics, varying human-mediated dispersal patterns may approximate a variety of non-normally (non-Gaussian) distributed patterns 84 . Given the need to understand the dispersal pathway of EAB, human-mediated spread was incorporated in models using lognormal and Johnson distributions that included long-distance "jump" dispersal dynamics in an effort to predict the spread of EAB 84,89 . We found that only by incorporating similar jump dispersal dynamics into our SLF spread model were we able to accurately predict the observed spread of the SLF. Our study provides a novel contribution to better understanding the SLF and provides the first demonstration of how SLF spread is driven by human-mediated movement.
As rates of globalization and urbanization continue to increase 89 leading to biological invasions worldwide 90 , it is becoming increasingly important for us to predict biological invasions at multiple scales. For example, a study that compiled data from insects intercepted at international borders over a 25-year period (1995-2019) detected nearly 1.9 million interception events, which after evaluating species accumulation curves via rarefaction analysis, found that documented interceptions were only a small proportion of all likely introductions of www.nature.com/scientificreports/ non-native insects that occur globally 80 . At local and regional scales, the understanding of a species' degree of invasibility 7 and various invasion pathways 2 are also just as important to understand when attempting to model and disentangle multiple factors that can contribute to the spread and range expansion of introduced non-native species that become established. Here, we provide evidence suggesting local-and regional-scale spread dynamics of the SLF, a novel invasive insect in the United States, are largely driven by human-meditated dispersal. Our findings should motivate future research to further conduct empirically based studies on the oviposition behavior and human-mediated movement of SLF egg masses and adults (e.g., gravid females) to improve predictive models and inform management and control strategies to slow the future spread of the SLF.

Data availability
All of the generated simulation results, derived data, and R code used to implement the model simulations and validation, and statistical analyses are publicly available at: https:// github. com/ zachl adin/ Spott ed_ Lante rnfly. www.nature.com/scientificreports/